Abstract

Since human randomness production has been studied and widely used to
assess executive functions (especially inhibition), many measures have been
suggested to assess the degree to which a sequence is random-like. However,
each of them focuses on one feature of randomness, forcing authors to
use multiple measures. Here we describe and advocate for the use of the
accepted universal measure for randomness based on algorithmic complexity,
by means of a recently presented technique using the definition
of algorithmic probability. A re-analysis of the classical Radio Zenith data
in the light of the proposed measure and methodology is provided as a case
study of an application.



Keywords: algorithmic complexity, subjective randomness, complexity 
measurement 

The production of randomness by humans requires high-level cognitive
abilities such as sustained attention and inhibition, and is impaired by poor
working memory [1]. Unlike other frontal neuropsychological tests, random
generation tasks possess specific features of interest: their demand on
executive functions, especially inhibition processes, is high; and more
importantly, training does not reduce this demand through automatization.
On the contrary, generating a random-like sequence requires continuous
avoidance of any routine, thus impeding any automatized success.

Random generation tasks have been widely used in the last decades to
assess working memory, especially (sustained) inhibitive abilities [2], in
normal subjects as well as in patients suffering from a wide variety of
pathologies. In normal subjects, random generation varies with personal
characteristics or states, such as belief in the paranormal [3] or cultural
background [4, 5]. It affords insight into the cognitive effects of aging [6],
hemispheric neglect [7], schizophrenia [8], aphasia [9], and Down syndrome [10].



As a rule, random generation tasks involve generating a random-like
sequence of digits [11], nouns [6], words [12] or heads-or-tails [13]. Some
authors have also offered a choice of more neutral items, such as dots, e.g.,
in the classical Mittenecker test [14]. Formally however, these cases all
amount to producing sequences of bits, that is, 0 or 1 digits, since any
object can be coded this way. In most studies, the sequence length lies
between 5 and 50 items. Measuring the "randomness" of a given short sequence
(say one of less than 1000 items) is thus a crucial challenge. Lacking an
objective and formal definition of randomness, authors usually use a variety
of indices, none of which is sufficient by itself because of the profound
limitations they all exhibit. Recently, Schulter, Mittenecker and Papousek
[15] provided software calculating the most widely used of such measures
applied to the case of the Mittenecker Pointing Test, and provided a
comprehensive overview of the usual coefficients of randomness in behavioral
and cognitive research.



1. The usual measures of randomness 

The usual coefficients used to assess the quality of a pseudo-random pro- 
duction may be classified in three large varieties according to their main 
focus. 

1.1. Departure from uniformity 

The simplest coefficients - even though they may rely upon sophisticated 
theories - are based on the mere distribution of outcomes, and are therefore 
independent of the order of the outcomes. In a word, they amount to the cal- 
culation of a distance between the observed distribution and the theoretical 
flat distribution, just as a chi-square would do. 

Information theory [16] has come a long way and is now often used as
a ground theory for assessing randomness. Given a finite sequence s of N
symbols repeatedly chosen from a set of n elements, the average symbol
information, also called entropy, is given by
$H(s) = -\sum_i p_i \log_2(p_i)$, where $p_i$ is the relative frequency of
item i in the finite sequence. This entropy is maximal when the relative
frequencies are all equal, and then amounts to $H_{max} = \log_2(n)$.



Symbol redundancy (SR; see [15]) is an example of a coefficient arising
from information theory. It is defined as $SR = 1 - H/H_{max}$, where H is
the entropy and $H_{max} = \log_2(n)$, with n the number of items to be
chosen from at each "toss". SR is no more than a measure of departure from
uniformity. A sequence's SR does not depend on all aspects of the sequence,
but only on the relative frequencies of each item comprising it. According
to SR, a binary sequence such as 000001 is weakly random as expected, but
010101, 000111 or 100101 are exactly equivalent to each other, since they all
minimize SR to 0. This is the most obvious limitation of SR as a measure of
randomness, as well as of any measure relying on the mere distribution of
symbols.
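
The limitation can be checked numerically. The following is a minimal
Python sketch, not the software of [15]; the function names `entropy` and
`symbol_redundancy` are chosen here for illustration only. It computes SR
for the four binary sequences discussed above.

```python
from collections import Counter
from math import log2

def entropy(seq):
    """Shannon entropy (bits per symbol) of the symbol frequencies in seq."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())

def symbol_redundancy(seq, alphabet_size):
    """SR = 1 - H / H_max, with H_max = log2(alphabet_size)."""
    return 1 - entropy(seq) / log2(alphabet_size)

for s in ["000001", "010101", "000111", "100101"]:
    # Only the counts of 0s and 1s matter, not their order.
    print(s, round(symbol_redundancy(s, 2), 3))
```

The three balanced sequences all receive the same score, although 010101
and 000111 follow obvious patterns: the order of the symbols never enters
the computation.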

1.2. Normality 

Beyond values depending on the sole distribution of symbols, one may 
consider pairs (dyads) or sequences of three (triads) adjacent outcomes. In 
a truly (infinite) random sequence, any dyad should appear with the same 
probability. Any distance between the distribution of dyads or triads (and so 
on) and uniformity may therefore be thought of as a measure of randomness. 
This is precisely what context redundancy $CR_1$ and coefficient of
constraint $CC_1$ do [15].



One may also consider dyads of outcomes separated by 1, 2 or k elements
in the sequence, which is done, for instance, through the $CC_k$ and $CR_k$
coefficients, a generalization of $CC_1$ and $CR_1$. Here we group methods
of this kind under the rubric of "normality assessment" for a reason that
will soon become clear.

In mathematics, a sequence of digits $d_1, \ldots, d_n, \ldots$, with each
$d_n$ lying within $[0, b]$ (or equivalently the real number written
$0.d_1 d_2 d_3 \ldots$ in base $b + 1$), is said to be normal in base
$b + 1$ if the asymptotic distribution of any dyad, triad, or more generally
of finite sequences of k consecutive digits, is uniform. Any such sequence
also satisfies what may seem a stronger property: dyads of outcomes separated
by a certain fixed number of other outcomes show, in the long run, a uniform
distribution. Therefore, a normal sequence will be considered random by any
normality assessment method.






However, there exist sequences produced by simple rules that are
normal. The Copeland-Erdős sequence, arising from the concatenation of
prime numbers (235711131719...), is an example. The Champernowne
sequence [17], a concatenation of all integers from one on
(123456789101112131415...), is an even simpler example. There even exist
rules to generate absolutely normal sequences, i.e. numbers that are normal
in any base b [18].
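
To make the point concrete, here is a short Python sketch (function names
are illustrative assumptions, not taken from any cited software). It
produces a prefix of the Champernowne sequence with a trivially simple
rule and tabulates the relative frequencies of single digits and of dyads,
which is what a normality assessment inspects.

```python
from collections import Counter
from itertools import count

def champernowne_digits(n_digits):
    """First n_digits decimal digits of 123456789101112..."""
    out = []
    for k in count(1):
        out.extend(str(k))
        if len(out) >= n_digits:
            return "".join(out[:n_digits])

def ngram_frequencies(seq, k):
    """Relative frequency of each overlapping block of k adjacent symbols."""
    blocks = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(blocks.values())
    return {b: c / total for b, c in blocks.items()}

digits = champernowne_digits(200_000)
single = ngram_frequencies(digits, 1)
dyads = ngram_frequencies(digits, 2)
print(min(single.values()), max(single.values()))
print(len(dyads), min(dyads.values()), max(dyads.values()))
```

Normality is an asymptotic property, so the frequencies on a finite prefix
are only roughly uniform and converge slowly; the sketch shows how the
counts are obtained rather than a proof of normality.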

1.3. Gaps 

Another variety of randomness coefficient is worked out using the rank 
distances between two identical items. For instance, in the sequence 12311, 
the distances between occurrences of the symbol 1 are 3 and 1. The frequency 
distribution of repetition distances (gaps) and the median of repetition gap 
distribution (MdG) are based on the study of the distance between two iden- 
tical outcomes. They have proved useful in detecting the so-called cycling 
bias: people tend to repeat an item only after they have used every other 
available item once [19].
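
As an illustration, repetition gaps can be computed as in the following
Python sketch. The function names are hypothetical, and pooling the gaps
of all symbols before taking the median is an assumption made here for
simplicity; it may differ from the exact definition of MdG used in the
cited software.

```python
from collections import defaultdict
from statistics import median

def repetition_gaps(seq):
    """Map each symbol to the rank distances between its successive occurrences."""
    last_seen = {}
    gaps = defaultdict(list)
    for pos, sym in enumerate(seq):
        if sym in last_seen:
            gaps[sym].append(pos - last_seen[sym])
        last_seen[sym] = pos
    return dict(gaps)

def median_gap(seq):
    """Median of all repetition gaps, pooled over symbols (None if no repetition)."""
    all_gaps = [g for gs in repetition_gaps(seq).values() for g in gs]
    return median(all_gaps) if all_gaps else None

print(repetition_gaps("12311"))  # {'1': [3, 1]}
print(median_gap("12311"))       # 2.0
```

For the sequence 12311 the sketch recovers the gaps 3 and 1 given in the
text for the symbol 1.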

Gap methods are flawed just as normality assessment is: a normal se- 
quence will pass these tests and be considered truly random, even if a naive 
rule produces it. 

1.4. Psychological justifications and limitations

Notwithstanding their potential limitations, the coefficients mentioned
above have proved useful in detecting some common biases in random
generation. SR-like values capture outcome biases - the over-use of certain
symbols [20]. Normality assessment accurately spots alternation biases - the
avoidance of using the same symbol twice, e.g., HH or TT - or the inverse
repetition bias. Context redundancy has also been linked with cognitive
flexibility [21], of which it constitutes an estimate. Gaps and related
methods would diagnose the cycling bias, a tendency to repeat the same
pattern or to exhaust every available symbol before a repetition [19]. For
instance, if the available symbols are 1, 2, 3, 4, a subject might choose
1, 3, 4, 2, 1, postponing the second appearance of "1" until after every
symbol has occurred once. Repetition avoidance is known to affect outcomes
as far as 6 ranks forward [22], a bias that gap methods shed light on.

It is unclear whether these measures happen to capture the basic biases in 
human random generation, or whether, unfortunately, authors have focused 
on these biases simply because they have had tools at their disposal for 
diagnosing them. As we have seen in the previous sections, a normal sequence
such as that suggested by Champernowne, which is highly non-random to the
extent that it is generated by a simplistic rule, would meet all randomness
criteria using symbol distribution, normality assessment, and even gap methods.

At this point, we may list three ways in which the usual randomness 
estimates are flawed: First, they do not capture non-standard biases in
randomness, such as the existence of a simple generation rule, provided this rule
produces sequences bearing some resemblance to a truly random one, e.g., a 
normal sequence. Only a few features of a random sequence are captured by 
these tailored measures. Second, they lack a theoretical basis. Despite being 
based on formal probabilistic properties, they nevertheless are not subsumed 
by a theory of randomness. In fact, they neither use nor provide any def- 
inition of a random sequence. Third, partly as an upshot of the first two 
points, several coefficients are needed to sketch an acceptable diagnosis of 
randomness quality, whereas a single measure would allow the comparison of 
sequences. 



2. Complexity for short sequences 

The need for a universal approach that does not focus on specific
arbitrary features, as well as a theoretical framework defining randomness,
has been expressed by psychologists [23] and addressed outside psychology by
the mathematicians Andrei Kolmogorov and Gregory Chaitin. The theory
of algorithmic complexity (also known as Kolmogorov-Chaitin complexity)
[24] provides a formal definition of a random sequence. In this section, we
first provide an overview of this theory, identify its limits, and then suggest
an approach for overcoming them, following recent developments in the field.

2.1. The theory of calculability

When considering the complexity of an object, one may think of said ob- 
ject as simple if it can be described in a few words. One can, for example, 
describe a string of a million alternating zeros and ones 01010101... as "A 
million times 01" and say that the string is simple given its short description. 
However, it is fair to point out that the description of something is highly 
dependent on the choice of language. The strings a language can compress 
depend on the language used, since any string (even a random-looking one) 
can be encoded using a one-word long description by mapping it onto any 
word of a suitable language. A language can always be tailor-made to
describe any given object by using something that describes said object in a
very simple way. Due to these difficulties it wasn't until the arrival on the
scene of the theory of computation, and the precise definition of a
computing machine by Alan Turing [26], that the theory of algorithmic
information found a formal framework on which it could build a definition of
complexity and randomness.

Today, the Turing machine model represents the basic framework under- 
lying most, if not all concepts in computer science, including the definition of 
algorithmic randomness. A Turing machine (henceforth TM) is an abstract 
model of a digital computer formalizing and defining the idea of mechanical 
calculation. 

A TM consists of a list of rules capable of manipulating a contiguous list
of cells (usually pictured as a tape), and an access pointer (an active cell)
equipped with a reading head. The TM can be in any one of a finite set of
states Q, numbered from 1 to n, with 1 the state at which the machine starts
a computation. There is a distinct state n + 1, called the halting state, at
which the machine halts. Each tape cell can contain a 0 or a 1 (sometimes
there is a special blank symbol filling the tape). Time is discrete and the
time instants (steps) are ordered from 0, 1, ..., with 0 the time at which
the machine starts its computation. At any given time, the head is positioned
over a particular cell. The head can move right or left, reading the tape. At
time 0 the head is situated over a particular cell on the tape called the
start cell, and the finite program starts in state 1. At time 0 the content
of the tape is called the machine input. With his universal machine (today
named a universal Turing machine, which we will denote simply by UTM), Turing
also proved that programs and data don't have any particular feature that
distinguishes them from one another, given that a program can always be
the input of another Turing machine, and data can always be embedded in a
program. A full description of a Turing machine can be written in 5-tuple
notation as follows: $\{s_i, k_i, s_{i+1}, k_{i+1}, d\}$, where $s_i$ is the
scanned symbol under the head, $k_i$ the state at time t, $s_{i+1}$ the
symbol to write at time t + 1, $k_{i+1}$ the state to be in at time t + 1
and d the head movement, either to the right or to the left. A TM can
perform the following, which defines one operation: (1) Write an element
from A = {0, 1}. (2) Shift the head one cell left or right. (3) Change to
state $k \in Q$.

1 Turing's seminal contribution is the demonstration that there are Turing machines
capable of simulating any other Turing machine [26]. One does not need specific computers
to perform specific tasks; a single programmable computer could perform any conceivable
task (we are now capable of running a word processor on the same digital computer on
which we play chess games).

When the machine is running it executes one such operation at a time
(one every step) until it reaches the halting state, if it ever does. At the
end of a computation the TM will have produced an output described by the
contiguous cells of the tape over which the head passed before halting. A TM
may or may not halt, and if it does not halt it is considered to have produced
no output. Turing also showed that there is no general procedure to determine
whether a Turing machine will halt or not [26]. This result is known as the
undecidability of the halting problem, and is commonly identified with the
term uncomputability.
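
The machine model described above can be sketched in a few lines of
Python. This is an illustrative simulator under stated assumptions, not
the code used in the cited work: the halting state is labelled `HALT`
rather than n + 1, the tape starts blank (all 0), the head does not move
on the halting transition, and a step cap stands in for the undecidable
question of whether the machine halts.

```python
from collections import defaultdict

HALT = "halt"  # label chosen here for the halting state (assumption)

def run_tm(rules, max_steps=1000):
    """Run a 2-symbol TM on a blank tape, starting in state 1.

    rules maps (state, scanned_symbol) -> (symbol_to_write, move, next_state),
    with move in {-1, +1}. Returns the contiguous visited cells as a string,
    or None if the machine has not halted within max_steps.
    """
    tape = defaultdict(int)
    pos, state = 0, 1
    lo = hi = 0  # leftmost and rightmost cells visited so far
    for _ in range(max_steps):
        write, move, nxt = rules[(state, tape[pos])]
        tape[pos] = write
        if nxt == HALT:
            return "".join(str(tape[i]) for i in range(lo, hi + 1))
        pos += move
        lo, hi = min(lo, pos), max(hi, pos)
        state = nxt
    return None

# A well-known 2-state machine that writes four 1s before halting.
four_ones = {
    (1, 0): (1, +1, 2), (1, 1): (1, -1, 2),
    (2, 0): (1, -1, 1), (2, 1): (1, +1, HALT),
}
# A machine that moves right forever and never halts.
runaway = {(1, 0): (0, +1, 1), (1, 1): (0, +1, 1)}

print(run_tm(four_ones))             # 1111
print(run_tm(runaway, max_steps=50)) # None
```

Returning `None` after `max_steps` only means "not halted yet"; deciding
that a machine never halts requires a separate argument.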

2.2. Algorithmic complexity 

The basic idea at the root of algorithmic complexity is that a string is
random (or complex) if it cannot be produced by a program much shorter
in length than itself. The algorithmic complexity C(s) of a bit string s is
formally defined as the length in bits of the shortest program that prints out
the string running on a UTM U. Formally,
$C(s) = \min_p \{|p| : U(p) = s\}$.

One is forced to use a universal Turing machine because one wants a
machine capable of printing out any possible string s. However, no general,
finite and deterministic procedure exists to calculate C(s), due to Turing's
undecidability of the halting problem. C(s) is usually approximated through
compression algorithms, such as the Lempel-Ziv algorithm [29]. The length
of the compressed string s is actually an upper bound of C(s) (up to the
constant length of the decompression program). However, compression
algorithms do not help when strings are short - shorter than, for example,
the compression program length in bits.
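
The difficulty with short strings is easy to observe. The sketch below
uses Python's standard `zlib` module (DEFLATE, a Lempel-Ziv variant) as a
stand-in compressor; this choice, the helper name `compressed_bits`, and
the coding of each bit as one ASCII character are assumptions made for
illustration.

```python
import random
import zlib

def compressed_bits(bitstring):
    """Length in bits of the zlib encoding of an ASCII string of 0s and 1s."""
    return 8 * len(zlib.compress(bitstring.encode("ascii"), 9))

random.seed(0)
long_regular = "01" * 5000
long_irregular = "".join(random.choice("01") for _ in range(10000))
short = "01101"

for name, s in [("regular", long_regular),
                ("irregular", long_irregular),
                ("short", short)]:
    # Compare the original length (in symbols) with the compressed size in bits.
    print(name, len(s), compressed_bits(s))
```

For long strings the compressed size separates the regular sequence from
the irregular one. For the 5-symbol string the fixed overhead of the
compressed format dominates, so the output is longer than the input and
says nothing about its complexity.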

2.3. The choice of Turing machine matters 

The definition of algorithmic complexity clearly seems to depend on the 
specific UTM U, and one may ask whether there exists a different UTM 
yielding different values for C(s). The following theorem indicates that the 
definition of algorithmic complexity makes sense even if measured on dif- 
ferent universal Turing machines (or if desired, using different programming 
languages):

Theorem 1 (invariance). If $U$ and $U'$ are two UTMs and $C_U(s)$ and
$C_{U'}(s)$ the algorithmic complexity of a binary string $s$ using $U$ and
$U'$, there exists a constant $c$ that does not depend on $s$, such that for
all binary strings $s$, $|C_U(s) - C_{U'}(s)| < c$.



The proof of this theorem is quite straightforward [30]. The ability of
universal machines to efficiently simulate each other implies a comparable
degree of robustness. There is a program $p_1$ for the universal machine U
that allows U to simulate U'. One can think of $p_1$ as a translator of U'
for U. Let $p_2$ be the shortest program producing s according to U'. Then
there is a program for U made of $p_1$ and $p_2$ and generating s as an
output, so that $C_U(s)$ exceeds $C_{U'}(s)$ by at most a constant that
depends only on $p_1$; exchanging the roles of U and U' gives the bound in
the other direction. For strings from a certain length on, this theorem
indicates that one will asymptotically approach the same complexity value
regardless of the choice of universal Turing machine, provided the
complexity eventually tends toward infinity.

However, constants can make the calculation of C(s) for short strings
profoundly dependent on the UTM used [31]. Both the problem of
non-computability and the problems posed by short strings may account for
the limited number of applications in psychology and other social sciences.

2.4. Algorithmic probability

Solomonoff [32] had the idea of describing the likelihood of a UTM
generating a sequence with a randomly generated input program. Formalizing
this idea, Levin defined the algorithmic probability of a string s as the
probability that a random program (the bits of which are produced with a
fair coin flip) would produce s running on a UTM U [33]. Formally Levin's
measure is defined as: $m(s) = \sum_{p : U(p) = s} 1/2^{|p|}$.

For m(s) not to be greater than 1, U has, however, to be what is called 
a prefix-free Turing machine, that is, a machine that only accepts programs 
that are not the beginning of any other program (i.e. there exists an "END" 
sequence finishing any acceptable program). Levin's probability measure 
induces a distribution over programs producing s, assigning a greater chance 
that the shortest program is the one actually generating s. 
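
A toy example may help fix ideas. In the Python sketch below, a small
hand-written table of programs and outputs plays the role of a prefix-free
machine; the table `TOY_MACHINE` is entirely hypothetical and far from
universal. The sketch checks prefix-freeness and evaluates the sum
defining m(s) on that table.

```python
from fractions import Fraction

# Hypothetical toy "machine": a finite prefix-free table program -> output.
TOY_MACHINE = {
    "0": "0",
    "10": "1",
    "110": "0",
    "1110": "01",
}

def is_prefix_free(programs):
    """True if no program is the beginning of another program."""
    progs = sorted(programs)
    # In lexicographic order, a prefix is immediately followed by an extension.
    return not any(b.startswith(a) for a, b in zip(progs, progs[1:]))

def algorithmic_probability(machine, s):
    """Sum of 2^-|p| over the programs p of the table that output s."""
    return sum(Fraction(1, 2 ** len(p)) for p, out in machine.items() if out == s)

assert is_prefix_free(TOY_MACHINE)
for s in ["0", "1", "01"]:
    print(s, algorithmic_probability(TOY_MACHINE, s))
total = sum(algorithmic_probability(TOY_MACHINE, s) for s in ["0", "1", "01"])
print(total)  # 15/16, which does not exceed 1
```

In this table the string "0" has two programs, and the shorter one
contributes 1/2 of its total weight of 5/8, which illustrates why the
shortest program dominates the sum.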

2.5. Algorithmic complexity for short strings 

The coding theorem connects algorithmic probability to algorithmic
complexity [30]:

Theorem 2 (coding theorem). For any finite string s and a prefix-free
universal Turing machine U,

$-\log_2(m(s)) = C_U(s) + O(1)$

The significance of this theorem lies in the fact that m(s) can be seen
as the probability that a prefix-free UTM U outputs s. The coding theorem
provides a means to overcome the impossibility of computing C(s) for short
strings, given that one can evaluate $C_U(s)$ by calculating
$-\log_2(m(s))$, using a very natural UTM defined by a lexicographical
enumeration of programs (programs enumerated by length). This idea has
already been suggested by Delahaye and Zenil [25] as a means of estimating
C(s) using m(s) as an alternative to the approximation of C(s) by
compression means. To this end, they sampled the space of all possible
Turing machines up to a certain size (2 symbols and 4 states). Given the
result obtained by Turing mentioned above, enumerating all possible Turing
machines and running them one by one is equivalent to using a single
universal Turing machine and running it program by program. We denote by
D(n) the probability distribution of the outputs of all 2-symbol n-state
Turing machines. In other words, D(n) provides the production frequency of
a string s among all 2-symbol n-state TMs.

A string is counted in D(n) if it is the output of a Turing machine, i.e. 
the content of the contiguous cells on the machine tape which the head has 
gone through before halting. 
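
The construction can be sketched for very small n. The Python code below
is an illustration under several assumptions that are not stated in the
text: each (state, symbol) pair may either halt (writing 0 or 1) or write,
move and change state, giving (4n + 2)^(2n) machines; every machine starts
on a blank tape of 0s; frequencies are normalized by the number of halting
machines; and a fixed step cap is used to give up on machines that have
not halted. It is not the authors' program.

```python
from collections import Counter, defaultdict
from itertools import product
from math import log2

HALT = 0  # label for the halting state (assumption)

def transitions(n):
    """Allowed actions for one (state, symbol) pair: 2 halting + 4n non-halting."""
    halting = [(w, 0, HALT) for w in (0, 1)]
    moving = [(w, d, q) for w in (0, 1) for d in (-1, 1) for q in range(1, n + 1)]
    return halting + moving

def run(rules, max_steps):
    """Output of a machine on a blank tape, or None if it exceeds max_steps."""
    tape, pos, state, lo, hi = defaultdict(int), 0, 1, 0, 0
    for _ in range(max_steps):
        write, move, nxt = rules[(state, tape[pos])]
        tape[pos] = write
        if nxt == HALT:
            return "".join(str(tape[i]) for i in range(lo, hi + 1))
        pos += move
        lo, hi = min(lo, pos), max(hi, pos)
        state = nxt
    return None

def output_distribution(n, max_steps=100):
    """Return (number of machines, output frequencies among halting machines)."""
    keys = [(q, s) for q in range(1, n + 1) for s in (0, 1)]
    counts, machines = Counter(), 0
    for choice in product(transitions(n), repeat=len(keys)):
        machines += 1
        out = run(dict(zip(keys, choice)), max_steps)
        if out is not None:
            counts[out] += 1
    halted = sum(counts.values())
    return machines, {s: c / halted for s, c in counts.items()}

machines, d2 = output_distribution(2)
for s, p in sorted(d2.items(), key=lambda kv: -kv[1])[:6]:
    # Frequency of s and the corresponding complexity estimate -log2(frequency).
    print(s, round(p, 4), round(-log2(p), 2))
```

The last column applies the coding theorem to turn a production frequency
into a complexity estimate. The step cap is only a placeholder for a
rigorous decision about which machines never halt.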

2.6. Solving the halting problem for small machines 

D(n), like m(s) and C(s), is a non-computable function, but one may
use the results from a problem popular among computer scientists called the 
Busy Beaver problem. 

The Busy Beaver problem is the problem of finding a value $\Sigma(n)$ for
the maximum number of 1s a TM with n states can produce before halting,
when starting from an empty input. Rado [34] proved that finding
$\Sigma(n)$ for arbitrary n is impossible given the undecidability of the
halting problem, but for n < 5 states and 2-symbol TMs, $\Sigma(n)$ is
known. These values are arrived at by essentially simulating all different
n-state machines and proving that the remaining non-halting machines will
never halt (which is possible only because the machines are small and
simple).

It is known that $\Sigma(2) = 4$, $\Sigma(3) = 6$ and $\Sigma(4) = 13$ for
2-symbol TMs. These values of the Busy Beaver problem allowed Delahaye and
Zenil [31] to numerically calculate the complexity of strings in a natural
way, using a universal and objective measure of randomness, without having
to deal with the problems posed by the impact of additive constants
resulting from the choice of universal Turing machine. The tables produced
by Zenil and Delahaye are available online at the following URL:
www.algorithmicnature.org



3. An application 

The complexity of short sequences based on D(4) may be used to assess
their randomness. We will now use it on a classical example, the Radio
Zenith experiment data. In 1937, the Radio Zenith foundation carried out
an experiment in telepathy: a set of senders picked series of 5 "heads or
tails" (or equivalent), which the listeners had to guess. The results
seemingly supported the telepathic hypothesis. However, Goodfellow [37]
showed that the sequences used in this experiment were not random. For
instance, the sequence 00000 or 11111 was used in less than 1% of the
trials. Since the listeners showed the same biases as the senders, the data
were shown in the end not to support the paranormal hypothesis.

However, this experiment provides an important dataset for the study
of the production of randomness in humans. When subjects try to guess
the sequence in the Radio Zenith experiment, we may assume they try not
to make it too obvious. Had they tried to be random, their answers would
have been uniform, which is not the case. Griffiths and Tenenbaum [38]
hypothesized that subjects do not really try to be random, but instead try
to maximize the probability that their answers will be random, in a Bayesian
way: let s be the answer (sequence); s may be produced by a rule or machine
(event M), or be truly random (event R). What subjects try to maximize is

$$P(R|s) = \frac{P(s|R)P(R)}{P(s|R)P(R) + P(s|M)P(M)}.$$



Our complexity approach to this question provides a means to actually com- 
pute such a probability (see Table 1). 
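
The Bayesian formula can be evaluated as in the following Python sketch.
Several ingredients are assumptions made here for illustration, since the
text does not spell them out: P(s|R) is taken as uniform over binary
strings of the given length, P(s|M) is supplied by the caller (for
instance a production frequency read from a D(4)-like table), and the
prior P(R) defaults to 0.5. The input values in the example are made up
and are not the paper's data.

```python
def posterior_random(p_s_given_m, length, prior_random=0.5):
    """P(R|s) by Bayes' rule, with P(s|R) = 2^-length (uniform random source)."""
    p_s_given_r = 2.0 ** -length
    numerator = p_s_given_r * prior_random
    return numerator / (numerator + p_s_given_m * (1.0 - prior_random))

# Hypothetical machine-production probabilities for two 5-bit strings.
print(posterior_random(1 / 32, 5))  # 0.5: a machine is no more likely than chance
print(posterior_random(1 / 16, 5))  # about 0.333: easily produced by a machine
```

The more often simple machines produce s, the larger P(s|M) and the lower
the posterior probability that s came from a random source.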

A power function gives a good approximation of the link between P(R|s)
and the Radio Zenith observed probability of s, RZ(s):
$RZ = P(R|s)^{4.37}$; r = 0.736; F = 16.544; p < 0.001. These results
support Griffiths and Tenenbaum's hypothesis, even though (1) many other
factors (cultural, cognitive,
etc.) may influence the responses, (2) we used a 4-state TM as a model for 
human cognition (instead of more complex Turing machines) and (3) 5-bit 
sequences are particularly short. The results also justify the D(4)-measure of 
complexity/randomness as a good alternative to the multiplication of tailor- 
made indices focusing on special features of sequences. 
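
A power-function fit of this kind can be reproduced from the values of
Table 1 with an ordinary least-squares regression on logarithms. The
Python sketch below assumes that this is the fitting procedure, which the
text does not state; the function name `fit_power_law` is illustrative.
Both columns of Table 1 are converted from percentages to proportions.

```python
from math import exp, log

# (P(R|s) in %, Zenith Radio frequency in %), copied from Table 1.
TABLE_1 = [
    (34.16, 0.84), (46.08, 1.39), (46.77, 3.73), (57.62, 4.48),
    (49.19, 4.99), (54.21, 14.23), (53.69, 11.82), (57.62, 5.66),
    (46.77, 3.22), (50.40, 8.68), (51.62, 4.34), (54.21, 5.70),
    (53.69, 10.90), (50.40, 11.66), (61.13, 6.48), (46.08, 1.95),
]

def fit_power_law(pairs):
    """Least-squares fit of log(y) = log(a) + b * log(x); returns (a, b, r)."""
    xs = [log(x / 100) for x, _ in pairs]
    ys = [log(y / 100) for _, y in pairs]
    n = len(pairs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                  # exponent of the power function
    a = exp(my - b * mx)           # multiplicative constant
    r = sxy / (sxx * syy) ** 0.5   # correlation on the log-log scale
    return a, b, r

a, b, r = fit_power_law(TABLE_1)
print(round(a, 3), round(b, 3), round(r, 3))
```

The fitted exponent and correlation can be compared with the values
reported above.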






Sequence (aggregated)   Zenith Radio data (%)   P(R|s) according to D(4) (%)
00000                    0.84                   34.16
00001                    1.39                   46.08
00010                    3.73                   46.77
00011                    4.48                   57.62
00100                    4.99                   49.19
00101                   14.23                   54.21
00110                   11.82                   53.69
00111                    5.66                   57.62
01000                    3.22                   46.77
01001                    8.68                   50.40
01010                    4.34                   51.62
01011                    5.7                    54.21
01100                   10.9                    53.69
01101                   11.66                   50.40
01110                    6.48                   61.13
01111                    1.95                   46.08



Table 1: For each 5-bit sequence (00000 and 11111 are aggregated, as well as 00110 and
11001, etc.), the table displays the percentage of people who produced the sequence, and
the probability that the sequence is random, in a complexity approach using D(4).

Other potential applications of the randomness assessment based on D(4)
include (1) the study of classical random-generation biases: alternation
bias, or cycling bias, may well turn out to be a more general complexity
bias (over-representation of complex sequences in human pseudo-random
sequences); and (2) the development of a numerical measure of attention
through random generation tasks.